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Abstract — Manifold learning is a hot research topic in the 
field of computer science. A crucial issue with current manifold 
learning methods is that they lack a natural quantitative measure 
to assess the quality of learned embeddings, which greatly limits 
their applications to real-world problems. In this paper, a new 
embedding quality assessment method for manifold learning, 
named as Normalization Independent Embedding Quality Assess- 
ment (NIEQA), is proposed. Compared with current assessment 
methods which are limited to isometric embeddings, the NIEQA 
method has a much larger application range due to two features. 
First, it is based on a new measure which can effectively 
evaluate how well local neighborhood geometry is preserved 
under normalization, hence it can be applied to both isometric 
and normalized embeddings. Second, it can provide both local 
and global evaluations to output an overall assessment. Therefore, 
NIEQA can serve as a natural tool in model selection and 
evaluation tasks for manifold learning. Experimental results on 
benchmark data sets validate the effectiveness of the proposed 
method. 

Index Terms — Nonlinear Dimensionality reduction, Manifold 
learning, Data analysis 

I. Introduction 

ALONG with the advance of techniques to collect and 
store large sets of high-dimensional data, how to effi- 
ciently process such data issues a challenge for many fields in 
computer science, such as pattern recognition, visual under- 
standing and data mining. The key problem is caused by "the 
curse of dimensionality" that is, in handling with such 
data the computational complexities of algorithms often go up 
exponentially with the dimension. 

The main approach to address this issue is to perform 
dimensionality reduction. Classical linear methods, such as 
Principal Component Analysis (PC A) (2), (3) and Multidi- 
mensional Scaling (MDS) |4], achieve their success under 
the assumption that data lie in a linear subspace. However, 
such assumption may not usually hold and a more realistic 
assumption is that data lie on or close to a low-dimensional 
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manifold embedded in the high-dimensional ambient space. 
Recently, many methods have been proposed to efficiently 
find meaningful low-dimensional embeddings from manifold- 
modeled data, and they form a family of dimensionality 
reduction methods called manifold learning. Representative 
methods include Locally Linear Embedding (LLE) (5), (6), 
ISOMAP 0, (5J, Laplacian Eigenmap (LE) (9), (TO), Hessian 
LLE (HLLE) (IT), Diffusion Maps (DM) (12), (13), Local 
Tangent Space Alignment (LTSA) (14), Maximum Variance 
Unfolding (MVU) [15], and Riemannian Manifold Learning 
(RML) (16). 

Manifold learning methods have drawn great research inter- 
ests due to their nonlinear nature, simple intuition, and com- 
putational simplicity. They also have many successful appli- 
cations, such as motion detection (TT) , sample preprocessing 
(T8) , gait analysis (19), facial expression recognition (20), 
hyperspectral imagery processing (21), and visual tracking 
(22). 

Despite the above success, a crucial issue with current 
manifold learning methods is that they lack a natural measure 
to assess the quality of learned embeddings. In supervised 
learning tasks such as classification, the classification rate can 
be directly obtained through label information and used as a 
natural tool to evaluate the performance of the classifier. How- 
ever, manifold learning methods are fully unsupervised and 
the intrinsic degrees of freedom underlying high-dimensional 
data are unknown. Therefore, after training process, we can 
not directly assess the quality of the learned embedding. As 
a consequence, model selection and model evaluation are 
infeasible. Although visual inspection on the embedding may 
be an intuitive and qualitative assessment, it can not provide 
a quantitative evaluation. Moreover, it can not be used for 
embeddings whose dimensions are larger than three. 

Recently, several approaches have been proposed to address 
the issue of embedding quality assessment for manifold learn- 
ing, which can be cast into tow categories by their motivations. 

• Methods based on evaluating how well the rank of neigh- 
bor samples, according to pairwise Euclidean distances, 
is preserved within each local neighborhood. 

• Methods based on evaluating how well each local neigh- 
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borhood matches its corresponding embedding under 
rigid motion. 

These methods are proved to be useful to isometric manifold 
learning methods, such as ISOMAP and RML. However, a 
large variety of manifold learning methods output normalized 
embeddings, such as LLE, HLLE, LE, LTSA and MVU, just to 
name a few. In these method, embeddings have unit variance 
up to a global scale factor. Then the distance rank of neighbor 
samples is disturbed in the embedding as pairwise Euclidean 
distances are no longer preserved. Meanwhile, anisotropic co- 
ordinate scaling caused by normalization can not be recovered 
by rigid motion. As a consequence, existent methods would 
report false quality assessments for normalized embeddings. 

In this paper, we first propose a new measure, named 
Anisotropic Scaling Independent Measure (ASIM), which can 
efficiently compare the similarity between two configurations 
under rigid motion and anisotropic coordinate scaling. Then 
based on ASIM, we propose a novel embedding quality assess- 
ment method, named Normalization Independent Embedding 
Quality Assessment (NIEQA), which can efficiently assess the 
quality of normalized embeddings quantitatively. The NIEQA 
method owns three characteristics. 

1) NIEQA can be applied to both isometric and normalized 
embeddings. Since NIEQA uses ASIM to assess the sim- 
ilarity between patches in high-dimensional input space 
and their corresponding low-dimensional embeddings, 
the distortion caused by normalization can be eliminated. 
Then even if the aspect ratio of a learned embedding 
is scaled, NIEQA can still give faithful evaluation of 
how well the geometric structure of data manifold is 
preserved. 

2) NIEQA can provide both local and global assessments. 
NIEQA consists of two components for embedding 
quality assessment, a global one and a local one. The 
global assessment evaluates how well the skeleton of a 
data manifold, represented by a set of landmark points, is 
preserved, while the local assessment evaluates how well 
local neighborhoods are preserved. Therefore, NIEQA 
can provide an overall evaluation. 

3) NIEQA can serve as a natural tool for model selection 
and evaluation tasks. Using NIEQA to provide quanti- 
tative evaluations on learned embeddings, we can select 
optimal parameters for a specific method and compare 
the performance among different methods. 

In order to evaluate the performance of NIEQA, we conduct 
a series of experiments on benchmark data sets, including both 
synthetic and real-world data. Experimental results on these 
data sets validate the effectiveness of the proposed method. 



TABLE I 
Main notations. 



X 
X 

Xi 
Xi 

Vi 

y 

Y 

y^ 

Yi 



n-dimensional Euclidean space where 

high-dimensional data samples lie 

m-dimensional Euclidean space, m < n, where 

low-dimensional embeddings lie 

The i-th data sample in R n , i = 1, 2, . . . , N 

X = {xi,X2, . . . ,xn} 

X = [xi X2 • • • xn], n x N data matrix 

Xi = {xi 1 , Xi 2 , . . . , Xi k }, local neighborhood of xi 

Xi = [xi 1 Xi 2 • • • Xi k ], n x k data matrix 

The index set of the k nearest neighbors of x% in X 

low-dimensional embedding of Xi, i = 1, 2, . . . , N 



■,V n} 
2/jv], m x N data matrix 
. . . , yi k }, low-dimensional embedding 



y = {2/1,2/2, • 
Y = [yi 2/2 • 
yi = {Vii > Vk 

of Xi 

Yi = [yi 1 yi 2 • • • yi k ], m x k data matrix 
The index set of the k nearest neighbors of m in y 
e = [1 1 ••• 1] T , k dimensional column vector 
of all ones 

Identity matrix of size k 
L2 norm for a vector 
Frobenius norm for a matrix 



The rest of the paper is organized as follows. A liter- 
ature review on related works is presented in Section [TT| 
The Anisotropic Scaling Independent Measure (ASIM) is 



described in Section III Then the Normalization Independent 
Embedding Quality Assessment (NIEQA) method is depicted 
in Section [IVJ Experimental results are reported in Section 
|V| Some concluding remarks as well as outlooks for future 



research are given in Section VI 



II. Literature review on related works 

In this section, the current state-of-the-art on embedding 
quality assessment methods are reviewed. For convenience and 
clarity of presentation, main notations used in this paper are 
summarized in Table [l| Throughout the whole paper, all data 
samples are in the form of column vectors. The superscript of 
a data vector is the index of its component. 

According to motivation and application range, existent 
embedding quality assessment methods can be categorized 
into two groups: local approaches and global approaches. 
Related works in the two categories are reviewed respectively 
as follows. 

A. Local approaches 

Goldberg and Ritov |23| proposed the Procrustes Measure 
(PM) that enables quantitative comparison of outputs of iso- 
metric manifold learning methods. For each X { and y^ their 
method first uses Procrustes analysis p4|-p6| to find an 
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optimal rigid motion transformation, consisting of a rotation 
and a translation, after which 3^ best matches X{. Then the 
local similarity is computed as 

k 

L(x i ,Y i ) = J2\\xi j -Ryi 3 -b\\l , 

3=1 

where R and t are the optimal rotation matrix and translation 
vector, respectively. Finally, the assessment is is given by 

1 N 

2=1 

where H k = h ~ e k e£. 

An Mp close to zero suggests a faithful embedding. Re- 
ported experimental results show that the PM method provides 
good estimation of embedding quality for isometric methods 
such as ISOMAP. However, as pointed out by the authors, PM 
is not suitable for normalized embedding since the geometric 
structure of every local neighborhood is distorted by normal- 
ization. Although a modified version of PM is proposed in 
(23), which eliminates global scaling of each neighborhood, it 
still can not address the issue of sperate scaling of coordinates 
in the low-dimensional embedding. 

Besides the PM method, a series of works follow the line 
that a faithful embedding would yield a high degree of overlap 
between the neighbor sets of a data sample and of its corre- 
sponding embedding. Several works are proposed by using 
different ways to define the overlap degree. A representative 
one is the LC meta-criteria (LCMC) proposed by Chen and 
Buja (27), (28), which can serve as a diagnostic tool for 
measuring local adequacy of learned embedding. The LCMC 
assessment is defined as the sum of local overlap degree and 
given by 

1 N 

Mlc = ^J2\^i) nAf k (yi)\ • (2) 

i—l 

Venna and Kaski (29) proposed an assessment method 
which consists of two measures, one for trustworthiness and 
one for continuity, based on the change of indices of neighbor 
samples in R n and R m according to pairwise Euclidean 
distances, respectively. Aguirre et a/.proposed an alternative 
approach for quantifying the embedding quality, by evaluating 
the possible overlaps in the low-dimensional embedding. Their 
assessment is used for automatic choice of the number of 
nearest neighbors for LLE (30) and also exploited in (3T) to 
evaluate the embedding quality of LLE with optimal regu- 
larization parameter. Akkucuk and Carroll 1 32 ] independently 
developed the Agreement Rate (AR) metric which shares the 
same form to Mlc- Based on AR, they suggested another 
useful assessment method called corrected agreement rate, by 



randomly reorganize the indices of data in y. Also with AR, 
France and Carroll (33) proposed a method using the RAND 
index to evaluate dimensionality reduction methods. 

Lee and Verleysen (34), (35) proposed a general frame- 
work, named co-ranking matrix, for rank-based criteria. The 
aforementioned methods, which are based on distance ranking 
of local neighborhoods, can all be cast into this unified 
framework. The block structure of the co-ranking matrix also 
provides an intuitive way to visualize the differences between 
distinct methods. In (36) , they further extended their work to 
circumvent the global scale dependency. 

The above assessments based on overlap degrees of neigh- 
borhoods are implemented in the same way: an embedding 
with good quality corresponds to a high value of the assess- 
ment. They work well for isometric embeddings since pairwise 
distances within each neighborhood are preserved. However, 
when the embedding is normalized, the neighborhood structure 
is distorted since pairwise distances are no longer kept. The 
overlap degree would be much lower than expected even if 
the embedding is of high quality under visual inspection. 

B. Global approaches 

Tenenbaum et al. (7) suggested to use the residual variance 
as a diagnostic measure to evaluate the embedding quality. 
Given X and y, the residual variance is computed by 

M RV = l-p 2 (G Xl D Y )) , (3) 

where p(Gx,Dy) is the standard linear correlation coeffi- 
cients taken over all entries of Gx and Dy. Here Gx(hj) 
is the approximated geodesic distance between xi and Xj (7) 
and Dy(i,j) = \\yi — yj\\2- A low value of Mrv close to 
zero indicates a good equality of the embedding. 

The Mrv measure was applied to choose the dimension of 
learned embedding for ISOMAP (7) and the optimal parameter 
for LLE (37) . Nevertheless, for a normalized embedding the 
geodesic distances are no longer preserved and the reliability 
of Mrv m ay decrease in such case. 

Dollar et al. (38) proposed a supervised method for model 
evaluation problem of manifold learning. They assume that 
there is a very large ground truth data set containing the 
training data. Pairwise geodesic distances are approximated 
within this set using ISOMAP, and the assessment is defined as 
the average error between pairwise Euclidean distances in the 
embedding and corresponding geodesic distances. However, in 
real situations we do not usually have such ground truth set 
and their assessment can not be used in general cases. 

Recently, Meng et al. proposed a new quality assessment 
criterion to encode both local-neighborhood-preserving and 
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global- structure-holding performances for manifold learning. 
In their method, a shortest path tree is first constructed from 
the fc-NN neighborhood graph of training data. Then the 
global assessment is computed by using Spearman's rank 
order correlation coefficient defined on the rankings of branch 
lengths. Finally, the overall assessment is defined to be a 
linear combination of the global assessment and Mlc- In their 
work, normalization is treated as a negative aspect in quality 
assessment, while our work is to define a new assessment 
which is independent of normalization. 

III. ASIM: Anisotropic Scaling Independent 
Measure 

In this section, we introduce a novel measure, named 
Anisotropic Scaling Independent Measure (ASIM), which can 
effectively evaluate the similarity between two configurations 
under rigid motion and anisotropic coordinate scaling. A syn- 



thetic example is first given in Subsection III-A to demonstrate 
why existent assessments fail under normalization. Then the 
motivation and overall description of ASIM are presented in 



Subsection |III-B| Finally, the computational details are stated 
in Subsection IIII-CI 

A. A synthetic example 

We randomly generate 100 points within the area [—2,2] x 



m 



which form the input data set X 



{#i,#2, • • • 7^100 }• Next we normalize X to get output data 
y such that YY T = I 2 , which are taken as the embedding of 
X. In fact, X can be obtained from y through a rotation and 
anisotropic coordinate scaling, that is, X = RSY where 

_ /-0.9991 0.0434\ / 11.6414 \ 

~ y 0.0434 0.0991 J ' ~ \ 5.6236 J 

In Fig. [TJ a )> Xi, i = 1, 2, . . . , 100 are marked with blue dots 
and the 10 nearest neighbors of the origin in X are marked 
with blue circles. In Fig.[TJb), = 1, 2, ... , 100 are marked 
with red dots and the 10 nearest neighbors of the origin in y 
are marked with red squares. Meanwhile, the corresponding 
embeddings of the 10 nearest neighbors of the origin in X are 
marked with blue circles. From Fig. [TJb) we can see that the 
neighborhood of the origin change a lot after normalization. 
Only 6 nearest neighbors are still in the neighborhood after 
normalization and the overlap degree is only 60%. Meanwhile, 
we also compute the Procrustes measure Mp between X and 
y and show it in Fig.[TJb). After normalization, Mp is as high 
as 0.8054. 

Through this synthetic example, we can clearly observe 
the distortion on Mp and local neighborhood overlap degree 
caused by normalization. 
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(a) 



M P = 0.8054 



O 



(b) 

Fig. 1. A synthetic example where existent assessments fail, (a) Input data 
X, marked by blue dots, (b) Normalized embedding y, marked by red dots. 
Black filled square: the origin (0, 0) T . Blue circles: the k nearest neighbors 
of the origin in X and their corresponding embeddings. Red squares: the k 
nearest neighbors of the origin in y. 



B. Motivation and description of ASIM 

Since a manifold is a topological space which is locally 
equivalent to a Euclidean subspace, an embedding would be 
faithful if it preserves the structure of local neighborhoods. 
Then we face a question that how to define the "preservation" 
of local neighborhood structure. 

Under the assumption that the data manifold is dense, each 
local neighborhood Xi can be roughly viewed as a linear 
subspace embedded in the ambient space. Considering possible 
normalization on y, a rational and reasonable choice is to 
define a new measure which can efficiently assess the simi- 
larity between Xi and y^ under rigid motion and anisotropic 
coordinate scaling. 

Formally, for each index i, we assume that there exists a 
rigid motion and anisotropic coordinate scaling between X { 
and 3^. Since a rigid motion can be decomposed into a rotation 
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and a translation, then for any G Xi we assume that 

Xij = PiD lVl3 + U , (4) 

where G M nxm is orthogonal, that is, P? Pi = I m . Di is a 
diagonal matrix of rank m and ^ G W 1 stands for an arbitrary 
translation. 

To evaluate how similar Xi and ^ are, our goal is to find 
optimal P* , D* and t* such that best matches under Eq. 
([?]). Equivalently, we need to solve the following constrained 
optimization problem 



min ELi IK " i? A T »ij " *; 



till 



(5) 



A G V(m) 

where P(m) is the set of all diagonal matrices of rank m. 

Then the neighborhood "preservation" degree can be defined 
as the sum of squared distances between corresponding sam- 
ples in Xi and 3^ under the above transformation. Formally, 
the anisotropic scaling independent measure (ASIM) is defined 
as follows 

k k 

M asim (XuYi) = ]T IK - p*D* yij - t*\\l/Y, IK Hi , 

3 = 1 3 = 1 

(6) 

or in matrix form 

M asim (Xi,Yi) = \\Xi - P*D*Yi - t*el\\ 2 F /\\Xi\\ 2 F , (7) 

where the normalization item in denominator is introduced to 
eliminate arbitrary scaling. 

C Computation of ASIM 

The optimization problem Eq. ([5} does not admit a closed- 
form solution. Alternatively, we use gradient descent method 
to solve Eq. ([5]). Note that all n x m orthogonal matrices 
form the so-called Stiefel manifold, which is a Riemannian 
submanifold embedded in R nm . We denote this manifold 
by St(n,m). Also note that V(m) is closed for matrix 
addition, multiplication and scalar multiplication, hence V(m) 
is homeomorphic to R m . Then Eq. ^ can be resolved by 
using gradient descent method over matrix manifolds. 

For convenience of presentation, we first introduce the S 
operator (39), which is defined as follows 

Definition 1. When the 5 operator is defined on a Tri- 
dimensional vector v = (^1,^2,-'' ?^n) T > S(v) ia a n x n 
diagonal matrix whose diagonal entries are just components 
of v, that is 

fvi \ 

l>2 



S(v) 



When the S operator is defined on a n x n square matrix 
A = (cbij), 6(A) is a n-dimensional vector formed by the 
diagonal entries of v, that is, 

6(A) = (an,a 2 2, ■ ■ ■ ,«nn) T • 
The 6 operator can be compounded, which yields 

6 2 (v) = v 



6\A) 



/an 



\ 



CL22 



\ 



With the above notations, Eq. §5§ now can be rewritten in 
matrix form as 

min \\Xi-PiDiYi-Uel\\ 2 F 

P^,D ^ ,t ^ 

s.t. Pi G St(n,m), DieV m . (8) 

Next, we solve Eq. ([8} in three steps, which are described 
respectively as follows. 

1) Computation t*: Let Li = PiDi and note that for any 
matrix A, \\A\\^p = tr (A T A). Then the objective function can 
be written as 

f{Li,U) =tr ((Xi-LiYi-tiel) T (Xi-LiYi-tiel)) . 

(9) 

By using the propositions of matrix trace Eq. ^ can be 
expanded as 

f(Li,U) = tr (xTXi)+ti (YfLjLiYi)- 
2tr (YiXfLi)+tv (t z ele k tj)- 
2tT(e T k Xjti)+tv(e T k YTLjti) . (10) 

Taking derivative with resect to U yields 

df(L u U) 



2kU - 2Xie k + 2LiYiU 



Since f(Li,ti) is a strict convex function of U, then by making 
both sides of the above equation to be zero, we can get the 
optimal solution to U as follows 



t* = ^(Xi - PiDiYi)e k . 



(11) 



Substitute t* into Eq. d8|), and the latter one is rewritten as 



min \\Xi - PiDiYi\\ 2 F 

Pi,Di 



s.t. Pi G St(n,m), DieV m , (12) 
where Xi = Xi(I k - \e k e\) and % = Yi(I k - \e k e\). 
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2) Computation ofD*: In the second step, we compute the 

- - T 

optimal solution D* to A with respect to P$. Let A\ = YiXi 
and Bi = P T XiYi , and denote the objective function in Eq. 
^ by f{P h Di). Then we have 

/(Pi, A) = tr (D 2 A z )-2ti (2?A)+tr (I.lf) 

m m 

= E4 } (4 l) ) 2 - 2 E^f+^(^ T ). 



where a^, b^ and c?y ; are the j-th diagonal entries of Ai, 
Bi and A> respectively. 

Since a$ > 0, j = 1, 2, . . . , m, / is a convex function 
of vector 5(A)- Taking partial derivative with respect to c^- 
(j = 1, 2, . . . , m) and by making them to be zero, we can 
get the global optimal solutions to Sy* (j = 1, 2, . . . , m) as 
follows 

6 (0 

df = ^f- j = l,2,...,m . 

Then D* is given by 

D* = {5\A l ))- 1 8 2 {B l ) . (13) 

Substituting Eq. ( [T3] ) into / yields 

/(Pi) = tr {A i {{5 2 {A i ))- 1 5 2 {B i )) 2 ) - 

2tr {{5 2 (A i ))- 1 5 2 {B i )B i ) +tr (M T ) 

tr (XiXi T ) 



i=i i a jj ; 


m 


33 
(*) 


i=i 




m / 6 (i)x2 

(i) ^ 


- tr (XiXi T ) . 









Let Mi = X^^ 2 ^))- 1 / 2 , then /(Pi) can be rewritten 



as 



= - tr {\[Pf Mi) {Pf Mi)) + tr (X,X, T ) , 

where P^ and M$. are the j-th columns of matrices Pi 
and respectively. stands for the Hadamard product 



over matrices. The optimization problem Eq. ( |T2| ) can be 
transformed into 



max (j)(Pi) = tr ((P/M*) (P[ M^) 
s. t. Pi G St(n,m) . 



(14) 



5 J Computation of P* : In the third step, we use gradient 
descent method over matrix manifold to solve Eq. ( [T4| ), which 
is an optimization problem for matrix function over the Stiefel 
manifold St(n,m). 

Denote the gradient of <j) in R nm by V0(P$) and the 
gradient of (j) on St(n, m) by V(/>(Pi), then by the proposition 



of Stiefel manifold |40], V0(P*) is the projection of V0(P») 
onto the tangential space at Pi and can be computed by the 
following formula 

if V0(P,) + (V0(P,)) T P* 



V0(Pi) = V0(Pi)-Pi- 



(15) 



Now all we need is to compute V0(P$). Let F(Pi) = 
(PfMi) (PTMi). From matrix calculus, the differentiation 
of with respect to Pi is 



^(P z ) = (vec/ m ) T DP(P 2 ) 



(16) 



where the vec operator reformulates a n x m matrix into a Tri- 
dimensional vector by stacking its columns one underneath 
other. 

Next we derive DF(Pi). First, we have 

dF(P z ) = 2(M?P l )®(M?dP i ) = 2Wl{{MfPi)®{MfdPi))W rn , 

where stands for the Kronecker product over matrices and 
W rn = (vec wiWi , vec w^wT^, • • • , vec w rn w^ n ) is an m 2 x rn 
matrix. w iy i = 1, 2, . . . , m is an m-dimensional vector who 
has 1 in its z-th component and elsewhere. Then we have 

vecdF(P z ) = 2vec{Wl((M?Pi)®{M?dP i ))W ni ) 

= 2(W£ Wm) vec(Mf P (MfdPi)) 

= 2(WT®W m )(Hi®I m )vec(MldPi) 

= 2{Wl^W m ){Hi^I rn ){I m ^Mf)dYecPi , 

where A = ((J m if mm )((vec Mf Pi) J m )) I m . Here 
if mm is a permutation matrix of order m 2 , and for any square 
matrix M of order m, K mrn vec M = vecM T . Then by 
matrix calculus (4TJ, we have 

DF(P,) = 2(Wl W m )(A Im)(Im ® Mf ) . 

Furthermore, through algebraic deduction and Eq. ( [T6| ), we 
have 

D0(P<) = (vec A) T DF(P,) = 2vec(Mi5 2 (P?Mi)) T . 
Then V0(P$) is given by the following formula 
V^(P,) = 2M l 8 2 {P^M i ) , 
and by using Eq. ( [15] ), V0(Pi) now reads 

V0(Pi) = 2MiS 2 (Pf Mi )—PiP r [5 2 (Pj Mi )—PS 2 {Pf Mi)Mj Pi . 

(17) 

Given a step length for iteration, we apply gradient descent 
method to find P* such that V0(P^) vanishes. In each itera- 
tion, we first update Pi as 

Pi = Pi+aV^(Pi) . 

Then we retract Pi to St(n,m). From the property of 
St(n,m), such retraction can be obtained through the QR 
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Algorithm 1: Anisotropic Scaling Independent Measure 
(ASIM), M asirn . 

Input: Local neighborhood matrix Xi and corresponding 
embedding matrix Yi, number of nearest neighbors 
k, step length for iteration a, and threshold e for 
stopping criterion. 

Output: M a8 im(Xi,Yi). 

Step 1. Assign Xi = Xi(I k — e^e^ ). 

Step 2. Assign % = Yi(I k - e k el). 

Step 3. Set initial value i^ (0) for Pi. 



Step 4. Use Eq. dl7k to compute V(/>(i^ 0) ) 



Step 5. If ||V0(pJ^)||f < e, goto Step 6; otherwise, do 

^ Vi> (0) + «v^ (0) ). 

Compute the QR decomposition of p/°\ P/ ^ = QiPi- 
Let P^ (0) <- Qi and goto Step 5. 

Step 6. Let P* = P^ (0) and use Eq. ^ to compute D*. 
Step 7. Use Eq. ( pTj ) to compute t*. Step 8. Compute 
Mq^pQ, Y^) through Eq. 0. 



decomposition of P^. Let Pi = Q^P^, where Qi G St(n,m) 
and Pi is an upper- triangular matrix. The retraction of P^ to 
S^n, m) is just Qi- 

In each iteration, we use Qi to update Pi until ||V0(Pi)|| j p 
is less than a given threshold e. After P* is computed, D* 
can be given by Eq. ([13]), and the optimal value to Eq. ( [T2| ) is 

/(p*,A*)- 

4) 77z£ algorithm and discussion: Finally, we summarize 
the computation process of M asirn in Algorithm [T] 

When the dimension n of input samples is very high, 
performing QR decomposition of Pi in each iteration will 
greatly increase of computational complexity of Algorithm 
[T] A possible solution to this issue is first projecting Xi to 
its tangential space, denoted as TXi, and then computing 
M as i m (TXi,Yi). When data are densely distributed on the 
manifold, TXi can optimally recover the local linear structure 
of a manifold. Therefore, such strategy is feasible. The tan- 
gential space can be approximated by using PCA, MDS or the 
method proposed in (42). 



IV. Normalization independent embedding quality 

ASSESSMENT 

When assessing the quality of embeddings, we need to 
consider both local and global evaluations. This leads to two 
issues. 

• Does the embedding preserve the global topology of the 
manifold? 



• Does the embedding preserve the geometric structure of 
local neighbor neighborhoods? 

In this section, we propose Normalization Independent Em- 
bedding Quality Assessment method (NIEQA) to address these 
two issues, which is independent of normalization. NIEQA is 
based on the ASIM measure stated in Section [TTT] and consists 
of two assessments, a local one and a global one. In the 
following subsections, we introduce these two assessments 
respectively as well as how NIEQA can be implemented in 
model selection and model evaluation. 

A. Local assessment 

For local neighborhood Xi on a data manifold and its cor- 
responding low-dimensional embedding y^ the local measure 
M asirn defined in last section characterizes how well local 
neighborhood structure is preserved and is independent of 
normalization. Therefore, we define the local assessment as 
the mean value of M as i m (Xi, Yi) over index i, that is, 

1 N 

M L (X,Y) = -^^m(X u Y t ) . (18) 

i=l 

B. Global assessment 

From geometric intuition, if an embedding preserves the 
global topology of the data manifold well, then such embed- 
ding should preserve relative positions among "representative" 
samples on the manifold. In other words, if we treat these "rep- 
resentative" samples as a local neighborhood, where pairwise 
Euclidean distances among neighborhood samples are replaced 
with pairwise geodesic distances on the manifold, then a good 
embedding should preserve the geometric structure of this 
neighborhood. 

Motivated by the above consideration, we define the global 
assessment as the matching degree between the aforemen- 
tioned described neighborhood and its corresponding embed- 
ding under rigid motion and anisotropic coordinate scaling. 

The computation of the global assessment consists of three 
steps, which are depicted below, respectively. 

1) Selecting landmark points. First, for each training 
sample find its ki nearest neighbors. Treat xi as 
a node in a graph and add edges among neighboring 
samples with edge length being pairwise Euclidean 
distance. Through such construction we get a connected 
graph. Then we use the shortest path length between xi 
and Xj to approximate the geodesic distance between 
them for all i and j. Next, we count how many shortest 
paths going through each X{ and record this number 
as its importance degree. Finally, the top 10% most 
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important data samples are selected as landmark points 
on the manifold and the set they formed is denoted by 
Xi. 

2) Computing 3^. Once X\ is fixed in the first step, the dis- 
tance between any two landmark points is defined to be 
the approximated geodesic distance. Then we implement 
MDS [4] to X\ to obtain its isometric embedding y, 
which optimally preserve relative positions of landmark 
points on the manifold. Note that the dimensions of y 
and yi are equal, and the latter one is the subset in y 
corresponding to X\. 

3) Computing the global assessment. We define the 
global assessment M G to be the ASIM measure between 
y and y t 



TABLE II 

Description of experimental data sets. 



M G (X,Y) = M asirn (Y h Y l ) , 



(19) 



where Y\ and Y\ are the m x I data matrices correspond- 
ing to y and y^ respectively. 

Remark 1. During landmark points selection, the parameter 
ki needs to be set manually. Based on experimental experience, 
setting ki = 0.1 A 7 " can yield a connected graph that approxi- 
mates the manifold structure well. However, if the graph is 
disconnected under current k\, k\ should be set to be the 
smallest integer which makes the graph fully connected. 

The landmark points selection method stated above has 
intuitive geometric motivation and is easy to implement. It 
can also be replaced with other more accurate yet more 
complicated approaches, for example, the methods proposed 
in f43\l and fi44\l. 



C. Implementation in model evaluation and model selection 

In this subsection, we state how to implement the NIEQA 
method to model evaluation and model selection for manifold 
learning. 

• Model evaluation. Given X, suppose that we have two 
embeddings, namely Y\ and Y2, obtained by different 
manifold learning methods. Then we say that Y\ owns 
better locality preservation than Y2 if Ml(X,Yl) < 
Ml{X 1 Y 2 ) and vice versa. We say Y\ owns better 
global topology preservation than Yi if Mq(X, Yi) < 
Mq(X, Y2) and vice versa. 

• Model selection. Given X and a set of parameters 
*P — {PiiP2, • • • :Pi}, for each parameter we compute 
its corresponding embedding yM using specific manifold 
learning method. Then we use Mq or Ml or their 
combination, which depends on the user's demand, to 
evaluate the quality of yW. Finally, the pi corresponding 



Data manifold 


N 


n 


m 


Description 


Swissroll 


1000 


3 


2 


Surface isometrically 










embedded in R 3 


Swisshole 


1000 


3 


2 


Surface embedded 
in R 3 


Gaussian 


1000 


3 


2 


Surface isometrically 










embedded in R 3 


lief ace 


1493 


560 


2 


Face manifold with 
resolution 28 x 20 



TABLE III 
Notations used in experiments. 



Notation 


Description 


M P 
M c p 
M LC 
M RV 


Procrustes measure (Eq. |lj) [23| 
Mp with global scaling removed 
LCMC measure (Eq. {2}) [27], |28 
Residual Variance measure (Eq. d3 


23] 
' @ 


M L 
M G 


Local assessment of NIEQA (Eq. ( 
Global assessment of NIEQA (Eq. 


l M 
TT9I) 


M t 


Matching degree between 3^ and 
ground truth U, M asim (Y, U) 



to the lowest assessment score is chosen to be the optimal 
parameter. 

V. Experiments 

In this section, the effectiveness of the NIEQA method is 
validated through a series of experimental tests on benchmark 
data sets. In Subsection V-A| NIEQA is applied to model 
evaluation for manifold learning. In Subsection |V-A| NIEQA is 
used to select optimal parameters for the LTSA method which 
outputs normalized embeddings. In experiments, NIEQA is 
compared with three commonly used assessment methods. We 
compute 1 — Mlc instead Mlc to obtain a unified criterion, 
that is, a small assessment value close to zero indicates good 
quality of the embedding. The benchmark data sets used in 
experiments are briefly depicted in Table [TT] and notations for 
methods are summarized in Table [nil 

A. Model evaluation 

In the first experiment, we apply NIEQA to model evalua- 
tion of the Swissroll manifold with parameter equation 

^1 



We use LLE [5], LE [10], LTSA [14], ISOMAP Q and 
RML [16] to learn this manifold, respectively. 1000 training 
samples are randomly generated and the number of nearest 
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neighbors is 10. Figs. [2] (c)-(g) shows the results of manifold 
learning, where X and U stands for the training data and the 
groundtruth of intrinsic degrees of freedom, respectively. By 
visual inspection, the embeddings given by LTSA and RML 
are the most similar to U. The one given by ISOMAP is a 
litter worse, and the one learned by LLE has a great change 
in global shape. LE fails to recover the geometric structure of 
U. 

For embeddings given by the above methods, we compute 
the different assessments described in Table [ill] and use bar 
plots to visualize their values in Figs. [2jh)-(m). From the 
bar plots, we can see that Mp only works for isometric 
embeddings given by ISOMAP and RML while reports false 
high values for normalized embeddings learned by LTSA and 
LLE. Although Mp eliminates the affects of global scaling, 
only the scale of Mp is normalized and it still reports false 
high values for normalized embeddings. Mlc and Mrv fails 
to output reasonable equality evaluations. It should be noted 
that Mrv is originally designed for the ISOMAP method and 
hence works well for the embedding given by ISOMAP. 

The two assessments Ml and M G in NIEQA provide 
overall and reasonable evaluations on embedding quality for 
various methods. Ml shows that LTSA and RML best pre- 
serve local neighborhood. LLE and ISOMAP perform worse, 
and LE performs the worst. Mq further indicates that the 
global- shape-preservation of the embedding given by LLE is 
not good. This completely matches visual inspection, which 
demonstrates that NIEQA can effectively evaluate the quality 
of both isometric and normalized embeddings. 

Besides, the bar plot of the matching degree M t between 
y and U is shown in Fig. |2jn). We can see that only Ml and 
Mq match M t , which validates the effectiveness of NIEQA. 

Similar to the first experiment, we apply NIEQA to model 
evaluation of the Swiss hole manifold, which shares the 
same parameter equation to Swissroll. The difference is 
that the set of intrinsic degree of freedoms U is no longer 
a convex set, where a rectangular region in U is digged out. 
Therefore, Swisshole manifold is geodesic non-connected. 
1000 training samples are randomly generated from the mani- 
fold and the number of nearest neighbors A: is 10. The learned 
low-dimensional embeddings and the bar plots of quality 
assessments are shown in Fig. [3] 

From Fig. [3] we can see that LTSA and RML correctly 
learned the geometric structure of U with the highest quality 
over other approaches. The embedding given by LLE has a 
distortion in global shape. ISOMAP and LE fails to learn the 
structure of hi. From the bar plots in Figs.[3](h)-(1), we can see 
that Ml reports a reasonable quality assessment and matches 



M t well which is illustrated in Fig. [3Jm). Mp and Mp works 
only for isometric embeddings provided by ISOMAP and 
RML. Mlc and Mrv fails to report reasonable evaluations. 
Since Swisshole manifold is geodesic non-connected, us- 
ing shortest path length would fail to approximate geodesci 
distance. Therefore, we do not compute the global assessment 
M G in NIEQA. 

In the third experiment, we apply NIEQA to model evalu- 
ation of the Gaussian manifold, whose parameter equation 
is 

( l l 

x = u 

< x 2 = u 2 

k x 3 = (l/27r)exp{-(( W 1 ) 2 + ( W 2 ) 2 )/2} 

1000 training samples are randomly generated from the mani- 
fold and the number of nearest neighbors k is 10. Fig. [4] shows 
the learned low-dimensional embeddings as well as bar plots 
of different quality assessments. 

From Fig. |4j we can observe that except LE all the 
other methods successfully learned the geometric structure 
of this manifold, whilst the quality of the embedding given 
by ISOMAP is a litter worse. From Figs. [4] (h)-(m), we can 
see that Mp performs well in this case by eliminating the 
global scaling factor. This is due to the isotropic property 
of this manifold. Mlc reports correct evaluations but still 
leans against to RML. Mrv fails to assess the embeddings 
correctly. Both the two assessments in NIEQA successfully 
evaluate the quality of different embeddings and match M t 
well. Note that the Gaussian surface is isotropic, hence the 
measure Mp also works. However, for anisotropic surfaces 
like Swissroll and Swisshole, only removing global 
scaling wound not yield a reasonable assessment. 

In the next experiment, we apply NIEQA to model eval- 
uation tasks on the lie face data set, which is a high- 
dimensional image manifold. As the code of RML on high- 
dimensional data is not available, we do not test RML on this 
data set. The training data contain 1965 face images, and the 
intrinsic degrees of freedom are the angle of face orientation 
and the variation of facial emotion. We randomly select 1493 
images as training data such that the data graph constructed 
via ISOMAP is connected. We apple LLE, LE, ISOMAP and 
LTSA to learn this manifold with 15 nearest neighbors. The 
two dimensional embeddings learned by these methods and bar 
plots of the quality assessments given by different methods are 
shown in Fig. [5] 

From Fig. [5] we can see that the embedding given by 
LLE does not recover the change of face orientation. The 
other methods all successfully extract the two intrinsic degrees 
of freedom despite the difference in embedding shape. The 
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Fig. 2. Manifold learning results on Swissroll. (a) Training data X. (b) Groundtruth of intrinsic degrees of freedom hi. (c)-(g) Embeddings learned by 
various method. The name of each method is stated below each subfigure. (h)-(n) Bar plots of different assessments on learned embeddings. The lower-case 
character under each bar corresponds to the index of the subfigure above. 



above visual inspection is also validated by the bar plots of 
quantitative assessments shown in Figs. [5je)-(i). M^c, Mrv 
and Ml all suggest that the quality of the embdding given by 
LLE is poor, while the others are almost of the same quality. 
Mlc and Ml indicate that the embedding given by LTSA is 
of the highest quality. Mp and Mp fail in this case. 

Remark 2. In experiments on high-dimensional image man- 
ifold, we did not compute Mq. The reason lies in that the 
computation of Mq needs to estimate geodesic distances based 
on shortest graph paths. However, we have no prior knowledge 



on the underlying geometric structure of image manifolds, 
hence using Mq to assess the global topology would yield 
unknown bias. Also note that the values of intrinsic degrees of 
freedom for image manifolds are unknown, hence we do not 
compute M t either. 

B. Model selection 

In this subsection, we take the LTSA method as an example 
to demonstrate the application of NIEQA to model selection 
task. The most important parameter for LTSA is the number 
of nearest neighbors k. We first apply NIEQA to selecting k 
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Fig. 3. Manifold learning results on Swisshole. (a) Training data X. (b) Groundtruth of intrinsic degrees of freedom hi. (c)-(g) Embeddings learned by 
various method. The name of each method is stated below each sub figure, (h)-(m) Bar plots of different assessments on learned embeddings. The lower-case 
character under each bar corresponds to the index of the subfigure above. 



for LTSA on the Swissroll data set. Similar to the first 



experiment in Section V-A We randomly select 1000 samples 
from the Swissroll manifold as training data. The values 
of k are chosen to be integers from 5 to 24. For each k, an 
embedding is learned with LTSA, which are shown in Fig. [6] 
The assessments given by NIEQA corresponding to different 
values of k are shown in Fig.[5Ja). From the figure we can see 
that when k is taking values between 6 and 15, LTSA would 
produce embeddings with high quality. This observation is also 
supported by visual inspection from Fig.[6j which validates the 
effectiveness of the NIEQA method. 



In the second experiment, we apply NIEQA to select 
optimal k for LTSA on the lief ace data set. Training data 
are the same to those used in the experiment in Section 
|V-A Values of k are taken to be integers from 5 to 24. 
For each k, an embedding is learned with LTSA, which is 
shown in Fig. [7] Corresponding quality assessment given by 
NIEQA are illustrated in Fig. [8jb), from which we can see 
that the embedding corresponding to k = 14 is of the highest 
quality. We can also observe that when k > 8, the quality of 
embeddings improves along with the increase of k, which is 
also validated by visual inspections from Fig. [7] 
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Fig. 4. Manifold learning results on Gaussian, (a) Training data X. (b) Groundtruth of intrinsic degrees of freedom hi. (c)-(g) Embeddings learned by 
various method. The name of each method is stated below each subfigure. (h)-(n) Bar plots of different assessments on learned embeddings. The lower-case 
character under each bar corresponds to the index of the subfigure above. 



VI. Conclusions and discussions 

In this paper, we proposed a novel normalization indepen- 
dent embedding quality assessment (NIEQA) method for man- 
ifold learning, which has wider application range than current 
approaches. We first propose a new local measure, which can 
quantitatively evaluate how well local neighborhood structure 
is preserved under rigid motion and anisotropic coordinate 
scaling. Then the NIEQA method, which is designed based on 
this new measure, can effectively and quantitatively evaluate 
the quality of both isometric and normalized embeddings. Fur- 



thermore, the NIEQA method considers both local and global 
topology, thus it can yield an overall assessment. Experimental 
tests on benchmark data sets validate the effectiveness of the 
proposed method. 

Some discussions and possible improvements in future 
works are stated below. 

• The measure M asirn is computed by using gradient 
descent method on matrix manifold. Whether the solution 
converges to a global optima remains unproved and is the 
key part of our future works. Meanwhile, we will also 
consider how to design more efficient iteration method 
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Fig. 5. Manifold learning results on lief ace. (a)-(d) Embeddings learned by various methods. The name of each method is stated below each subfigure. 
(e)-(i) Bar plots of different assessments on learned embeddings. The lower-case character under each bar corresponds to the index of the subfigure above. 



14 



k=5 


k=6 


k=7 


k=8 






S 1 




1 1 




! 1 


k=9 


k=10 


k=11 


k=12 


« i 




■ 1 




I 1 




I 1 


k=13 


k=14 


k=15 


k=16 


1 1 




I ■ 




* i 






k=17 


k=18 


k=19 


k=20 
















k=21 


k=22 


k=23 


k=24 





























Fig. 6. Embeddings given by LTSA on Swiss roll data set with different values of k. 




(d) (e) 
Fig. 7. Embeddings given by LTSA on lief ace data set with different values of k. 



(f) 



15 



to accelerate convergence. 

• The NIEQA method is based on a local matching method- 
ology. Its basic assumption is that the manifold is densely 
sampled and training data strictly lie on the manifold. 
For data manifold with noise or outliers, the efficiency 
of NIEQA may be affected. A possible solution to this 
issue is to implement denoising or outlier removal process 
before training. 

• Based on NIEQA, whether we can design a manifold 
learning method with better learning performance is also 
one of our future works. 
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